Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for ARB (Application Resource Bundle) (.arb) format #338

Merged
merged 2 commits into from
Jun 25, 2024

Conversation

dsavinov-actionengine
Copy link
Contributor

Problem and/or solution

Adding parsing and compiling for ARB (Application Resource Bundle) (.arb) format

How to test

1. Running unit-tests

/openformats/tests/formats/arb/test_arb.py contains tests for arb
Use pytest /openformats/tests/formats/arb/test_arb.py to run tests

2. Through testbed

Use "ARB" handler in the testbed

Reviewer checklist

Code:

  • Change is covered by unit-tests
  • Code is well documented, well styled and is following best practices
  • Performance issues have been taken under consideration
  • Errors and other edge-cases are handled properly

PR:

  • Problem and/or solution are well-explained
  • Commits have been squashed so that each one has a clear purpose
  • Commits have a proper commit message according to TEM

@dsavinov-actionengine
Copy link
Contributor Author

Tagging @kbairak for review, but feel free to tag others if needed

Copy link
Member

@kbairak kbairak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!

self.existing_keys.add(key)

if isinstance(value, DumbJson):
self._find_keys(value, key)
Copy link
Member

@kbairak kbairak Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of going infinitely deep, we could instead search up to level two and collect entries that are non-text. This way we don't have to bother with escaping keys etc. Here is an idea; I haven't tested it so there might be a few errors:

class ArbHandler(JsonHandler):
    ...

    def parse(self, content, **kwargs):
        ...

        keys = self._find_keys(parsed)
        stringset = self._extract(parsed, keys)

    def _find_keys(self, parsed):
        if parsed.type != dict:
            raise ParseError("...")

        keys, keys_to_ignore = set(), set()
        for key, key_position, value, _ in parsed:
            if not key.startswith("@"):
                if not isinstance(value, str):
                    continue
                if key in keys:
                    raise ParseError(
                        f"Duplicate string key ('{key}') in line "
                        f"{self.transcriber.line_number}"
                    )
                keys.add(key)
            else:
                if isinstance(value, DumbJson):
                    (inner_value, _), = value.find_children('type')
                    if inner_value != "text":
                        keys_to_ignore.add(key)
        return keys - keys_to_ignore

    def _extract(self, parsed, keys):
        stringset = []
        for key, _, value, value_position in parsed:
            if key not in keys:
                continue
            ...
        return stringset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not try this. Anyway, ARBs normally have only two levels of json, so we don't expect infinite nesting.

else:
raise ParseError("Invalid JSON")

def compile(self, template, stringset, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the only reason you re-define compile and _copy_until_and_remove_section is so that you can add the keep_sections functionality, which is only needed for testing. I am not very fond of this idea. It makes the code unnecessarily complex. However, there is a case where not removing the sections makes sense.

This is not adequately described in the docs, but when Transifex invokes this handler, it will pass is_source as a keyword argument. In this case, we can be certain that

  1. the stringset will not be missing anything and that
  2. the user wants to get the exact same file they previously uploaded

So, if this is indeed the same use-case, maybe you can rename keep_sections to is_source.

In any case, lets make sure that when is_source=True is passed to compile, the end result is the same as the originally uploaded file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried this, replaced keep_sections with is_source, but that did not work, unit-tests failed. Anyway, re-definition of the compile() method is necessary now, as it handles language code for ARB.

raise ParseError("Invalid JSON")

def _extract(self, parsed):
if parsed.type == dict:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this check, we already did it in _find_keys. Perhaps to make it clearer, lets make one check in parse and then in _find_keys and _extract assume that parsed represents a dict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kbairak
Copy link
Member

kbairak commented Jun 12, 2024

There is another issue I am concerned about. Looking at the examples of the official Flutter documentation, it looks like the language files are supposed to be more bare: only key-translation pairs without any metadata.

image

So, the question is: should we follow this? should we make sure that when is_source=False during compile, only the key-translation pairs should be present while when is_source=True we should make sure that they get back the same file they originally uploaded?

Have you looked into this?

@dsavinov-actionengine
Copy link
Contributor Author

There is another issue I am concerned about. Looking at the examples of the official Flutter documentation

...

should we make sure that when is_source=False during compile, only the key-translation pairs should be present while when is_source=True we should make sure that they get back the same file they originally uploaded?
Have you looked into this?

Did not implement this logic in the latest commit. It looks unusual if compared to other openformats handlers, which have their compiled files as similar as possible to the source files (only strings from stringset change). Maybe we could try to get back to this idea in a separate pull-request.

continue

context_key = f"@{key}.context"
context_value = self.metadata[context_key] \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented issue 1 from QA spreadsheet

)
new_template = self._clean_empties(new_template)

if language_info is not None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented issue 2 from QA spreadsheet

context_value = self.metadata[context_key] \
if context_key in self.metadata.keys() else ""
description_key = f"@{key}.description"
description_value = self.metadata[description_key] \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented issue 3 from QA spreadsheet

@kbairak kbairak enabled auto-merge June 25, 2024 13:26
@kbairak kbairak merged commit 385d505 into transifex:devel Jun 25, 2024
3 checks passed
@txsentinel txsentinel mentioned this pull request Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants